Tuning-Robust Initialization Methods for Speaker Diarization
Similar resources
Novel initialization methods for Speaker Diarization
Speaker Diarization is the process of partitioning an audio input into homogeneous segments according to speaker identity, where the number of speakers in a given audio input is not known a priori. This master's thesis presents a novel initialization method for Speaker Diarization that requires less manual parameter tuning than most current GMM/HMM-based agglomerative clustering techniques and is ...
Robust Speaker Diarization for meetings
This thesis presents research on speaker diarization for meeting rooms. It examines the algorithms and the implementation of an offline speaker segmentation and clustering system for meeting recordings, where usually more than one microphone is available. The main research and system implementation were carried out while visiting the International Computer Science Institute...
Unsupervised Methods for Speaker Diarization
Given a stream of unlabeled audio data, speaker diarization is the process of determining “who spoke when.” We propose a novel approach to solving this problem by taking advantage of the effectiveness of factor analysis as a front-end for extracting speaker-specific features and exploiting the inherent variabilities in the data through the use of unsupervised methods. Upon initial evaluation, o...
Friends and enemies: a novel initialization for speaker diarization
The task of speaker diarization consists of answering the question “Who spoke when?” The most commonly used approach to speaker diarization is agglomerative clustering of multiple initial clusters. Even though the initial clustering is greatly modified by iterative cluster merging and possibly multiple resegmentations of the data, the initialization algorithm is a key module for system perform...
Robust Unsupervised Speaker Segmentation for Audio Diarization
Audio diarization (Reynolds & Carrasquillo, 2005) is the process of partitioning an input audio stream into homogeneous regions according to their specific audio sources. These sources can include audio type (speech, music, background noise, etc.), speaker identity and channel characteristics. With the continually increasing number of large volumes of spoken documents including broadcasts, voic...
Journal
Journal title: IEEE Transactions on Audio, Speech, and Language Processing
Year: 2010
ISSN: 1558-7916,1558-7924
DOI: 10.1109/tasl.2010.2040796